Using Feature Structures to Improve Verb Translation in English-to-German Statistical MT
نویسندگان
چکیده
SCFG-based statistical MT models have proven effective for modelling syntactic aspects of translation, but still suffer problems of overgeneration. The production of German verbal complexes is particularly challenging since highly discontiguous constructions must be formed consistently, often from multiple independent rules. We extend a strong SCFG-based string-to-tree model to incorporate a rich feature-structure based representation of German verbal complex types and compare verbal complex production against that of the reference translations, finding a high baseline rate of error. By developing model features that use source-side information to influence the production of verbal complexes we are able to substantially improve the type accuracy as compared to the reference.
منابع مشابه
Modeling verbal inflection for English to German SMT
German verbal inflection is frequently wrong in standard statistical machine translation approaches. German verbs agree with subjects in person and number, and they bear information about mood and tense. For subject–verb agreement, we parse German MT output to identify subject–verb pairs and ensure that the verb agrees with the subject. We show that this approach improves subject-verb agreement...
متن کاملDiscontinuous Verb Phrases in Parsing and Machine Translation of English and German
In this paper, we focus on the verb-particle (V-Prt) split construction in English and German and its difficulty for parsing and Machine Translation (MT). For German, we use an existing test suite of V-Prt split constructions, while for English, we build a new and comparable test suite from raw data. These two data sets are then used to perform an analysis of errors in dependency parsing, word-...
متن کاملIdentifying main obstacles for statistical machine translation of morphologically rich South Slavic languages
The best way to improve a statistical machine translation system is to identify concrete problems causing translation errors and address them. Many of these problems are related to the characteristics of the involved languages and differences between them. This work explores the main obstacles for statistical machine translation systems involving two morphologically rich and under-resourced lan...
متن کاملIssues in Translating Verb-Particle Constructions from German to English
In this paper, we investigate difficulties in translating verb-particle constructions from German to English. We analyse the structure of German VPCs and compare them to VPCs in English. In order to find out if and to what degree the presence of VPCs causes problems for statistical machine translation systems, we collected a set of 59 verb pairs, each consisting of a German VPC and a synonymous...
متن کاملHow to Account for Idiomatic German Support Verb Constructions in Statistical Machine Translation
Support-verb constructions (i.e., multiword expressions combining a semantically light verb with a predicative noun) are problematic for standard statistical machine translation systems, because SMT systems cannot distinguish between literal and idiomatic uses of the verb. We work on the German to English translation direction, for which the identification of support-verb constructions is chall...
متن کامل